title: “Assessing and improving measurement of ESCS in PISA 2018”
subtitle: “Markup week 2 exercise 2”
author:
name: Kirsten van Kessel
orcid: 0009-0009-1265-8516
email: k.d.vankessel@students.uu.nl
format:
revealjs:
theme: simple
slide-number: true
logo: "logo.png"
scrollable: true
editor: visual
bibliography: references.bib
tbl-cap-location: bottom
Equal weights:
Pros: Stable over years, simple
Cons: Performs worse -> Measurement invariance
Regression imputation:
Pros: Simple
Cons: Outdated, performs worse
$$
F = \sum_p \sum_{g_m<g_n} w_{g_m.g_n} f(\lambda_{pg_m} - \lambda_{pg_n}) + \sum_p \sum_{g_m<g_n} w_{g_m.g_n} f(\tau_{pg_m} - \tau_{pg_n})
$$
in which p is the number of indicators, \(g_m\) and \(g_n\) indicate country m and n for every pair of countries in the data, \(\lambda_{pg_m}\) and \(\lambda_{pg_n}\) indicate the factor loadings of country m and n, and \(\tau_{pg_m}\) and \(\tau_{pg_n}\) indicate the intercepts of country m and n. Within the total loss function \(F\) the component loss function \(f\) scales the difference between the parameters for every pair of countries and for every measurement parameter:
$$
f(x) = \sqrt{\sqrt{x^2 + 0.01}}
$$
\(w_{g_m.g_n}\) is a weight that represents the size of the countries. It is defined as:
$$
w_{g_m.g_n} = \sqrt{N_{g_n} N_{g_m}}
$$
in which \(N_{g_n}\) and \(N_{g_m}\) are the number of participants from country n and m.
# drop the items where a score is missing
data_homepos_long_complete <- data_homepos_long %>% drop_na(item_score)
# estimate the parameters
parms <- dexterMML::fit_2pl(data_homepos_long_complete %>%
dplyr::select(person_id, item_id, item_score))
coef(parms)
# calculate theta: the ability estimate
theta <- ability.mml(data_homepos_long_complete, parms, method = "WLE")
|——————|——–|———|——–|——-|———|——|
# load library
library(ggplot2, warn.conflicts = FALSE)
## Warning: package 'ggplot2' was built under R version 4.2.2
library(plotly, warn.conflicts = FALSE)
## Warning: package 'plotly' was built under R version 4.2.3
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.2.2
# load data
data <- readRDS("recreation_escs_data.rds")
# drop missings
data_compare <- data %>% drop_na()
# select few countries
data_compare <- data_compare %>% filter(data.CNT == "ALB")
# make plot
HOMEPOS_plot <- ggplot(data_compare, aes(x = data.HOMEPOS, y = data.HOMEPOS_theta)) + geom_point(alpha = 0.05) +
ggtitle("Comparing computed HOMEPOS to the PISA HOMEPOS") +
xlab("Computed HOMEPOS") + ylab("PISA HOMEPOS")
ggplotly(HOMEPOS_plot)